Phase Transitions for High Dimensional Clustering and Related Problems
Abstract
Consider a two-class clustering problem where we observe $X_i = \ell_i \mu + Z_i$, $Z_i \overset{iid}{\sim} N(0, I_p)$, $1 \le i \le n$. The feature vector $\mu \in \mathbb{R}^p$ is unknown but is presumably sparse. The class labels $\ell_i \in \{-1, 1\}$ are also unknown, and the main interest is to estimate them. We are interested in the statistical limits. In the two-dimensional phase space calibrating the rarity and strength of useful features, we find the precise demarcation between the Region of Impossibility and the Region of Possibility. In the former, useful features are too rare or too weak for successful clustering. In the latter, useful features are strong enough to allow successful clustering. The results are extended to the case of colored noise using Le Cam's idea of comparison of experiments. We also extend the study of statistical limits for clustering to signal recovery and to global testing. We compare the statistical limits for the three problems and expose some interesting insights. We propose classical PCA and Important Features PCA (IF-PCA) for clustering. For a threshold $t > 0$, IF-PCA clusters by applying classical PCA to all columns of $X$ with an L-norm larger than $t$. We also propose two aggregation methods. For any parameter in the Region of Possibility, some of these methods yield successful clustering. We discover a phase transition for IF-PCA. For any threshold $t > 0$, let $\xi$ be the first left singular vector of the post-selection data matrix. The phase space partitions into two different regions. In one region, there is a $t$ such that $\cos(\xi, \ell) \to 1$ and IF-PCA yields successful clustering. In the other, $\cos(\xi, \ell) \le c_0 < 1$ for all $t > 0$. Our results require delicate analysis, especially on post-selection Random Matrix Theory and on lower bound arguments.
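The model and the IF-PCA procedure described in the abstract can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not the authors' implementation: the column norm is taken to be the Euclidean ($L^2$) norm, the useful features are placed in the first s coordinates with a common strength tau, and the function names (simulate, if_pca, clustering_error) and the threshold in the usage part are hypothetical choices made for illustration.

```python
import numpy as np


def simulate(n, p, s, tau, rng):
    """Draw X_i = l_i * mu + Z_i with Z_i ~ N(0, I_p).

    Assumption for illustration: mu has s nonzero entries, all equal to tau,
    placed in the first s coordinates.
    """
    labels = rng.choice([-1, 1], size=n)
    mu = np.zeros(p)
    mu[:s] = tau
    X = np.outer(labels, mu) + rng.standard_normal((n, p))
    return X, labels


def if_pca(X, t):
    """Keep columns whose norm exceeds t, then cluster by the sign of the
    first left singular vector of the post-selection matrix.

    Assumption: the Euclidean norm is used for column selection.
    """
    keep = np.linalg.norm(X, axis=0) > t
    if not keep.any():
        raise ValueError("threshold t removed every column")
    U, _, _ = np.linalg.svd(X[:, keep], full_matrices=False)
    xi = U[:, 0]
    est = np.where(xi >= 0, 1, -1)
    return est, xi, keep


def clustering_error(est, truth):
    """Fraction of mislabeled samples, up to a global sign flip."""
    err = np.mean(est != truth)
    return min(err, 1.0 - err)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, p, s, tau = 200, 2000, 50, 1.0
    X, labels = simulate(n, p, s, tau, rng)
    # Hypothetical threshold: null columns have squared norm near n,
    # useful columns near n * (1 + tau**2); pick a value in between.
    t = np.sqrt(1.5 * n)
    est, xi, keep = if_pca(X, t)
    cos = abs(xi @ labels) / (np.linalg.norm(xi) * np.linalg.norm(labels))
    print("columns kept:", int(keep.sum()))
    print("|cos(xi, l)|:", round(float(cos), 3))
    print("clustering error:", clustering_error(est, labels))
```

In this easy regime the useful features are strong, so the selection step isolates them and the first left singular vector aligns closely with the label vector. Making tau smaller or s smaller moves the setting toward the region where no threshold works.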
Related resources
High-Dimensional Unsupervised Active Learning Method
In this work, a hierarchical ensemble of projected clustering algorithm for high-dimensional data is proposed. The basic concept of the algorithm is based on the active learning method (ALM) which is a fuzzy learning scheme, inspired by some behavioral features of human brain functionality. High-dimensional unsupervised active learning method (HUALM) is a clustering algorithm which blurs the da...
Phase structure of two-dimensional U(N) lattice gauge fields with complex action
We study the phase structure of two dimensional pure lattice gauge theory with a Chern term. The symmetry groups are non-Abelian, finite and disconnected sub-groups of SU(3). Since the action is imaginary it introduces a rich phase structure compared to the originally trivial two dimensional pure gauge theory. The Z3 group is the center of these groups and the result shows that if we use one ...
Observed Universality of Phase Transitions in High-Dimensional Geometry, with Implications for Modern Data Analysis and Signal Processing
We review connections between phase transitions in high-dimensional combinatorial geometry and phase transitions occurring in modern high-dimensional data analysis and signal processing. In data analysis, such transitions arise as abrupt breakdown of linear model selection, robust data fitting or compressed sensing reconstructions, when the complexity of the model or the number of outliers incr...
Robust high-dimensional semiparametric regression using optimized differencing method applied to the vitamin B2 production data
Background and purpose: By evolving science, knowledge, and technology, we deal with high-dimensional data in which the number of predictors may considerably exceed the sample size. The main problems with high-dimensional data are the estimation of the coefficients and interpretation. For high-dimension problems, classical methods are not reliable because of a large number of predictor variable...
GCMC Glauber dynamics study for structural transitions in YBCOx (0<x<1), HTc system
We have chosen an Ising ASYNNNI (ASYmmetric Next Nearest Neighbor Interaction) model under a grand canonical regime to investigate structural phase transition from a high symmetric tetragonal (Tet) to a low symmetric orthorhombic in YBa2Cu3O6+x , 0<x<1, HTc system. Ordering process for absorbed oxygens from an external gas bath into the basal plane of the layered system is studied by Monte C...
Publication date: 2015